ABSTRACT
This paper is based on the multidimensional scaling technique
of Joseph B. Kruskal. It comprises three parts. The first part
describes Kruskal's objectives and introduces his goodness of
fit measure, called stress. The second part discusses some
problems associated with Kruskal's technique, focusing on the
concept of stress. In the third part, an alternate goodness of
fit measure, called V, is proposed, together with a different
procedure for doing multidimensional scaling. Part three also
includes a discussion of the superiority of V over stress as a
goodness of fit measure.
I.   MULTIDIMENSIONAL  SCALING

Like  all  statistical  techniques,  multidimensional  scaling 
is  a  method  of  summarizing  and  drawing  inferences  from  a  large 
body  of  data.   In  this  case,  the  data  are  the  judgments  made 
by  a  respondent  about  the  similarities  or  differences  between 
stimuli presented in pairs. For N stimuli, multidimensional
scaling attempts to find N points in a t-dimensional mapping
whose interpoint distances (N(N-1)/2 of them in all) somehow
resemble or match the corresponding N(N-1)/2
similarity-dissimilarity judgments made by the respondent.

The importance of the number t stems from its interpretation
as the number of dimensions on which the respondent
based  his  judgments.   The  best  method  for  determining  this 
number  when  the  investigator  is  using  the  multidimensional 
scaling  techniques  to  be  discussed  in  this  paper  has  been 
given  by  Joseph  B.  Kruskal.   (Kruskal,  1964a)   His  method 
assumes  the  capability  to  derive  a  mapping  for  any  number  of 
dimensions  (one,  two,  three  or  more)  and  then  involves  a 
comparison  of  these  mappings  of  different  dimensionality. 
Since  the  question  of  how  to  derive  a  mapping  for  an  arbitrary 
number  of  dimensions  is  the  main  topic  of  this  paper,  the 
dimensionality  of  the  mapping  which  multidimensional  scaling 
seeks  to  derive  will  be  two  throughout  this  paper.   The 
techniques  for  deriving  a  mapping  are  the  same  whether  the 
dimensionality  is  one,  two,  three  or  more.   Also,  the  mapping 
will  always  be  in  Euclidean  space.   The  contents  of  this  paper 
can be adapted with very little trouble, however, to
non-Euclidean spaces based on a city-block metric or a
Minkowski r-metric. (Kruskal, 1964a)


The  discussion  can  be  simplified  by  the  use  of  an  example. 
Suppose  one  is  interested  in  identifying  the  dimensions  of 
appeal  of  political  candidates.   What  factors  make  some 
candidates  attractive  to  a  respondent  and  other  candidates 
unattractive?   For  simplicity,  suppose  the  investigator 
examines  the  feelings  of  one  respondent  with  respect  to  four 
political candidates. Multidimensional scaling would help
the investigator determine these factors or dimensions of
appeal by providing him with a t-dimensional (two-dimensional
in this case) mapping of the candidates. The mapping would be
based on judgments made by the respondent about the
similarities or differences between the candidates presented
in pairs.

One method of eliciting the judgments of a respondent
concerning the similarities or differences between candidates
presented in pairs is to administer a simple questionnaire to
him. A typical item in such a questionnaire might resemble
the following:

     Please specify how similar or how different these two
     individuals are in their general appeal to you by circling
     one of the numbers, 1 through 9. If you circle number 1, it
     implies that they are exactly equal in their general appeal
     to you, while if you circle number 9, it implies that they
     are extremely different in their general appeal to you.

                               Exactly                  Extremely
                               Equal                    Different

     1.  Lyndon B. Johnson     1   2   3   4   5   6   7   8   9
         Hubert H. Humphrey

If the respondent's feelings toward four candidates were to
be examined, he would be asked the same question about 5 other
pairs of candidates, making a total of 6 questions in all
(4(3)/2).


The basic premise underlying the analysis of data from a
questionnaire of this kind is that the numbers circled are
measures of psychological distance, closeness or proximity
between stimuli for the respondent. Shepard calls them
proximity measures. (Shepard, 1962a) Here, however, they will
be called psychological distances. These psychological
distances will be labeled $\delta_{ij}$'s, with the i referring
to one stimulus and the j referring to the other. The
investigator only obtains N(N-1)/2 judgments from the
respondent since $\delta_{ij}$ equals $\delta_{ji}$ by
assumption, and a special experimental design is required if
$\delta_{ii}$ is to have any meaning. (If the assumption were
dropped and the special design employed, the method of
analysis would not change.) The formula N(N-1)/2 can be
obtained by counting the elements below the diagonal in the
lower triangular portion of an N by N matrix or by using the
formula for the number of combinations of N objects taken two
at a time, which is $\binom{N}{2}$ or N(N-1)/2.

A number of computer-based procedures for doing
multidimensional scaling are currently available. (Shepard
1962, Kruskal 1964, Lingoes 1965) However, the discussion in
this paper will be limited to the most popular of these, the
procedure proposed by Joseph B. Kruskal in 1964. In addition to
being the most widely used, Kruskal's technique is the best
vehicle for the introduction of a slightly different technique
in this paper. For the most part, Kruskal's notation will be
used in the analysis to follow.


Since the properties of the $\delta_{ij}$'s will become
important later, it should be noted that they are measurements
on an ordinal scale. In other words, the investigator can say
that a $\delta_{ij}$ of 8 is greater than one of 5; however, the
difference between the two (for example, 3) is not meaningful.
Meaningful differences accompany both linear interval and
ratio scales, but not an ordinal scale. To obtain interval
proximity measures or psychological distances
($\delta_{ij}$'s), one would need an experimental model
somewhat different from the one outlined by Kruskal and used
in this paper. For example, interval measures can be obtained
by the "method of multidimensional rank order," the "method of
complete triads," or a number of other methods. (Torgerson
1958) All of these methods are based on the law of comparative
judgment. It should be noted, however, that even the law of
comparative judgment does not yield $\delta_{ij}$'s that are
measurements on a ratio scale, a point that will become
important later. (Thurstone 1920)

As mentioned earlier, the investigator has obtained
N(N-1)/2 distance judgments from the respondent. Let M equal
N(N-1)/2. These psychological distances, $\delta_{ij}$'s, have
a certain rank order:

$$\delta_{i_1 j_1} < \delta_{i_2 j_2} < \cdots < \delta_{i_m j_m} < \cdots < \delta_{i_M j_M}.$$

For example, a respondent might provide the following answers
to a four candidate questionnaire:

$\delta_{12} = 6$     $\delta_{14} = 9$     $\delta_{34} = 1$

$\delta_{13} = 8$     $\delta_{24} = 7$     $\delta_{23} = 2$

This would mean that

$$\delta_{34} < \delta_{23} < \delta_{12} < \delta_{24} < \delta_{13} < \delta_{14}$$
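
The rank ordering of the judgments can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the name `rank_order` are assumptions made for the example and do not come from Kruskal's program.

```python
# Dissimilarity judgments from the four-candidate example.
# Keys are stimulus pairs (i, j) with i < j; values are the circled numbers.
judgments = {
    (1, 2): 6, (1, 4): 9, (3, 4): 1,
    (1, 3): 8, (2, 4): 7, (2, 3): 2,
}


def rank_order(deltas):
    """Return the stimulus pairs sorted from smallest to largest judgment."""
    return sorted(deltas, key=deltas.get)


ordered_pairs = rank_order(judgments)
print(ordered_pairs)
# [(3, 4), (2, 3), (1, 2), (2, 4), (1, 3), (1, 4)]
```

The sketch assumes that no two judgments are tied; tied judgments would need a separate convention.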
Multidimensional scaling seeks to obtain a two (or t)
dimensional mapping, called a configuration, of the stimuli
for which the Euclidean (or non-Euclidean if they are desired)
distances between the stimuli have the same rank order as the
psychological distances, or $\delta_{ij}$'s. This is the
isomorphism which multidimensional scaling seeks to create
between the psychological distances or proximity measures and
the interpoint distances in a Euclidean mapping. Let $X_i$ be a
two-dimensional vector, $(x_{i1}, x_{i2})$, referring to the
ith political candidate's position in the two-dimensional
mapping in Euclidean space. The Euclidean distance between the
two candidates, i and j, is the square root of the sum of
squares of the distances along each axis, or by the
Pythagorean theorem,

$$d_{ij} = \sqrt{\sum_{t=1}^{2} (x_{it} - x_{jt})^2}$$
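
As a small illustration of the distance formula, the following Python sketch computes all N(N-1)/2 interpoint distances of a configuration. The coordinates are hypothetical and the function name `interpoint_distances` is introduced only for this example.

```python
import math
from itertools import combinations


def interpoint_distances(points):
    """Map each pair (i, j), i < j, to the Euclidean distance between points.

    `points` maps a stimulus label to its coordinate tuple.
    """
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(sorted(points), 2)
    }


# Hypothetical two-dimensional configuration of four candidates.
configuration = {1: (0.0, 0.0), 2: (3.0, 0.0), 3: (3.0, 4.0), 4: (0.0, 4.0)}
distances = interpoint_distances(configuration)
```

Because `math.dist` accepts coordinate tuples of any common length, the same function works for one, three or more dimensions.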

In the four candidate example, the investigator would
want to find a two-dimensional mapping of the candidates for
which $d_{34} \le d_{23} \le d_{12} \le d_{24} \le d_{13} \le d_{14}$.
The only fixed characteristics of the mapping are the
relationships between the $d_{ij}$'s. The axes can be rotated in
any direction and the origin placed anywhere. Kruskal places
the origin at the centroid of the configuration and normalizes
the configuration by making the sum of the squared distances of
the points from the origin equal to one. Finally, he
"normalizes the angular attitude of the configuration by
rotating it so that its so-called principal axes coincide with
the coordinate axes (in the natural order)."1 The principal
axes rotation is very important in the achievement of a
solution for a different multidimensional scaling technique,
that of Roger N. Shepard.2 However, it is not important for
solution purposes in the Kruskal technique, although it might
help the investigator in the interpretation of his results.
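
The first two normalization steps (centroid at the origin, unit sum of squared distances from the origin) can be sketched as follows. This is a minimal illustration under stated assumptions, not Kruskal's code: the function name `normalize` and the sample points are hypothetical, and the principal axes rotation is deliberately left out.

```python
def normalize(points):
    """Center a configuration at its centroid and rescale it so that the sum
    of squared distances of the points from the origin equals one."""
    n = len(points)
    dims = len(points[0])
    centroid = [sum(p[k] for p in points) / n for k in range(dims)]
    centered = [[p[k] - centroid[k] for k in range(dims)] for p in points]
    scale = sum(c * c for p in centered for c in p) ** 0.5
    if scale == 0.0:
        raise ValueError("all points coincide; the configuration is degenerate")
    return [[c / scale for c in p] for p in centered]


# Hypothetical configuration of four points in two dimensions.
normalized = normalize([(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)])
```

Translation and uniform rescaling leave the rank order of the interpoint distances unchanged, which is why these steps do not affect the fit.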

Of course not all configurations of the points (particular
mappings of the candidates) will yield $d_{ij}$'s that have the
same rank order as the $\delta_{ij}$'s. Consequently, what the
investigator needs and what Kruskal provides is an index to
determine how close a given configuration comes to satisfying
the rank order requirements which the $\delta_{ij}$'s place on
the $d_{ij}$'s. This index is called stress.

1Kruskal, Joseph B., "Nonmetric Multidimensional Scaling:
A Numerical Method," Psychometrika, v. 29, p. 120, June 1964.

2Shepard, Roger N., "The Analysis of Proximities:
Multidimensional Scaling With an Unknown Distance Function,"
Psychometrika, v. 27, p. 132, June 1962.

Prior to defining stress, Kruskal introduces a new set
of symbols, called $\hat{d}_{ij}$'s. The $\hat{d}_{ij}$'s are
numbers which completely satisfy the rank order requirements
given by the $\delta_{ij}$'s. If the $d_{ij}$'s themselves
satisfy these requirements, then the set of $\hat{d}_{ij}$'s
could be, and in fact will be, identical to the set of
$d_{ij}$'s. However, consider the following situation. The
$\delta_{ij}$'s are in the order specified in the example used
earlier,

$$\delta_{34} < \delta_{23} < \delta_{12} < \delta_{24} < \delta_{13} < \delta_{14},$$

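and the mapping that has been obtained has the following
$d_{ij}$'s:

$d_{34} = 2$     $d_{12} = 3$     $d_{13} = 7$

$d_{23} = 1$     $d_{24} = 4$     $d_{14} = 6$

The rank order of the $d_{ij}$'s is the following:

$$d_{23} \le d_{34} \le d_{12} \le d_{24} \le d_{14} \le d_{13}$$

A set of numbers, $\hat{d}_{ij}$'s, that satisfy the rank order
constraints set by the $\delta_{ij}$'s can be obtained in the
following way:

$\hat{d}_{34} = \hat{d}_{23} = (d_{34} + d_{23})/2 = 1.5$     $\hat{d}_{24} = d_{24}$

$\hat{d}_{12} = d_{12}$     $\hat{d}_{13} = \hat{d}_{14} = (d_{13} + d_{14})/2 = 6.5$,

so that $\hat{d}_{34} \le \hat{d}_{23} \le \hat{d}_{12} \le \hat{d}_{24} \le \hat{d}_{13} \le \hat{d}_{14}$.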
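
The averaging in this example can be reproduced with the pool-adjacent-violators procedure, a standard way of computing a least-squares monotone fit. The sketch below is an assumption about how one might carry out the computation; it is not taken from Kruskal's program, and the name `monotone_regression` is hypothetical. Ties among the psychological distances are not handled.

```python
def monotone_regression(d):
    """Least-squares non-decreasing fit to d by pooling adjacent violators.

    `d` lists the distances in the rank order of the psychological distances.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in d:
        blocks.append([value, 1])
        # Pool the last two blocks while the earlier mean exceeds the later one.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Distances in the order 34, 23, 12, 24, 13, 14 from the example above.
d_hat = monotone_regression([2, 1, 3, 4, 7, 6])
print(d_hat)
# [1.5, 1.5, 3.0, 4.0, 6.5, 6.5]
```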

This example demonstrates that the $\hat{d}_{ij}$'s are based
on averages of certain $d_{ij}$'s. In the example, so-called
"equality blocks" (for lack of a better name) were created for
$d_{34}$ and $d_{23}$ and for $d_{13}$ and $d_{14}$ by
averaging $d_{34}$ and $d_{23}$ to find $\hat{d}_{34}$
($= \hat{d}_{23}$) and averaging $d_{13}$ and $d_{14}$ to find
$\hat{d}_{13}$ ($= \hat{d}_{14}$). The method of calculation of
$\hat{d}_{ij}$'s for every situation is part of a technique
called "monotone regression." (Miles 1959) Monotone regression
is not discussed in any detail in this paper. However, one of
its properties is that the differences between the $d_{ij}$'s
and the $\hat{d}_{ij}$'s computed in the example represent the
minimum differences between the distances, the $d_{ij}$'s, and
any set of numbers satisfying the rank ordering specified by
the $\delta_{ij}$'s.

In the above paragraph the point was made that if the
$d_{ij}$'s do not satisfy the rank order constraints, the
$\hat{d}_{ij}$'s will be averages of certain $d_{ij}$'s, as seen
in the example. If the problem has M distances, then it can be
shown that there are $2^{M-1} - 1$ possible ways to average the
$d_{ij}$'s to obtain $\hat{d}_{ij}$'s; or, if the case under
which each $\hat{d}_{ij}$ equals its respective $d_{ij}$ is
considered to be a degenerate type of averaging, then $2^{M-1}$
possible ways exist.3

3The proof of this statement is a lengthy one that must
be performed inductively. Since the number $2^{M-1}$ is not
crucial to this analysis, the proof will not be given here.

Another example may help. Suppose the investigator is
dealing with three stimuli and consequently with three
distances: $d_{12}$, $d_{13}$, $d_{23}$. The psychological
distances are in the following order:
$\delta_{12} < \delta_{13} < \delta_{23}$. There are $2^{3-1}$
or 4 different ways to average $d_{ij}$'s to obtain
$\hat{d}_{ij}$'s. First of all, each $\hat{d}_{ij}$ may be equal
to its respective $d_{ij}$, or

(1)  $\hat{d}_{12} = d_{12}$
     $\hat{d}_{13} = d_{13}$
     $\hat{d}_{23} = d_{23}$

Another possibility is that

(2)  $\hat{d}_{12} = \hat{d}_{13} = (d_{12} + d_{13})/2$
     $\hat{d}_{23} = d_{23}$

A third is that

(3)  $\hat{d}_{12} = d_{12}$
     $\hat{d}_{13} = \hat{d}_{23} = (d_{13} + d_{23})/2$

The final possibility is that

(4)  $\hat{d}_{12} = \hat{d}_{13} = \hat{d}_{23} = (d_{12} + d_{13} + d_{23})/3$

Monotone regression would lead to one of the four
specifications, depending on the order of the $d_{ij}$'s
obtained from a particular mapping. For example, given that
$\delta_{12} < \delta_{13} < \delta_{23}$, the second
specification would be appropriate if

$d_{12} > d_{13}$
$d_{12} \le d_{23}$
$d_{13} < d_{23}$

Each of the four specifications will be called a block
equality system. In the fourth specification, the block
equality is $\hat{d}_{12} = \hat{d}_{13} = \hat{d}_{23}$, by
definition. In the third, $\hat{d}_{13}$ equals $\hat{d}_{23}$
by definition, while in the second specification,
$\hat{d}_{12}$ equals $\hat{d}_{13}$ by definition. There are no
defined equalities in the first specification.
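
The count of block equality systems can be checked by enumeration. The sketch below assumes, as in the three-stimulus example, that a block always consists of distances that are consecutive in the rank order; the function name `block_equality_systems` is introduced only for this illustration.

```python
from itertools import product


def block_equality_systems(m):
    """Yield every way to split m rank-ordered distances into consecutive blocks.

    Each system is a list of blocks; a block is a list of 0-based positions
    whose fitted values are defined to equal the block average.
    """
    for cuts in product([False, True], repeat=m - 1):
        system, block = [], [0]
        for position, cut in enumerate(cuts, start=1):
            if cut:  # start a new block before this position
                system.append(block)
                block = []
            block.append(position)
        system.append(block)
        yield system


systems = list(block_equality_systems(3))
print(len(systems))
# 4
```

Each of the M-1 gaps between neighboring distances is either a block boundary or not, which gives the $2^{M-1}$ systems mentioned in the text.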

Now that the method of obtaining the $\hat{d}_{ij}$'s from
the $d_{ij}$'s has been outlined and the concept of a block
equality system as a defined equality between $\hat{d}_{ij}$'s
has been introduced, stress can be defined:

$$\text{Stress} = \sqrt{\frac{\sum_{m=1}^{M} \left(d_{i_m j_m} - \hat{d}_{i_m j_m}\right)^2}{\sum_{m=1}^{M} d_{i_m j_m}^2}}$$
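
As a worked instance of the definition, the following Python sketch evaluates stress for the four-candidate mapping used earlier, with the distances and their fitted values typed in by hand. The function name `stress` is an assumption of this example.

```python
import math


def stress(d, d_hat):
    """Stress = sqrt(sum((d - d_hat)^2) / sum(d^2))."""
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a * a for a in d)
    return math.sqrt(numerator / denominator)


# Distances and fitted values from the four-candidate example, listed in the
# rank order of the psychological distances: 34, 23, 12, 24, 13, 14.
d = [2, 1, 3, 4, 7, 6]
d_hat = [1.5, 1.5, 3, 4, 6.5, 6.5]
value = stress(d, d_hat)
```

Here the numerator is 4(0.25) = 1 and the denominator is 115, so the stress is the square root of 1/115, roughly 0.093.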

The heart of Kruskal's technique is the derivation of
the points (the X's) in the mapping, and subsequently their
distances. Nonlinear programming becomes relevant at this
point since the problem is to find the points and their
distances that do the following:

Minimize   Stress

Subject to:

$$\hat{d}_{i_1 j_1} \le \hat{d}_{i_2 j_2} \le \cdots \le \hat{d}_{i_m j_m} \le \cdots \le \hat{d}_{i_M j_M}$$

Kruskal employs the "method of steepest descent" to
solve this problem. (Kruskal 1964b) His use of this method
implies that he is treating the minimization as an
unconstrained one, since this method is generally employed in
unconstrained minimization problems. (Spang 1962) As Spang
points out, applying the "method of steepest descent" to a
constrained minimization problem, as Kruskal in fact does,
requires the construction of a Lagrangian and then the
unconstrained minimization of the Lagrangian. Kruskal uses the
"method of steepest descent" but mentions neither Lagrangians
nor the convexity assumptions that are normally made when
minimizing a Lagrangian. It should also be noted, in passing,
that the formula for the gradient on page 125 (Kruskal 1964)
is incorrect since it fails to take into account the fact
that the $\hat{d}_{ij}$'s change as the $d_{ij}$'s change.

A more conventional nonlinear programming approach to
this problem shows that Kruskal's technique actually derives
a solution to one of $2^{M-1}$ different constrained
minimizations (nonlinear programming problems). There is one
nonlinear programming problem for each different block equality
system or definition of the $\hat{d}_{ij}$'s.4 The "method of
steepest descent" leads to the solution to one of these
$2^{M-1}$ different problems. However, the "method" by itself
cannot determine if the solution to another of these $2^{M-1}$
problems (where the $\hat{d}_{ij}$'s are defined differently)
would have a lower stress value than the one which it has
derived. There could be $2^{M-1} - 1$ other minima, under
different block equality systems, that are smaller than the
one yielded by the "method of steepest descent." (Of course,
there could also be none.) A different approach to the problem
that would bypass the above difficulty might be possible.5
However, as will be shown in the next part, in most cases,
stress is not a good index of goodness of fit in the first
place. Even if a better minimization technique were possible,
it would have no effect upon the suitability of stress as a
measure of goodness of fit. Consequently, the next section will
be devoted to a discussion of some of the problems inherent in
the concept of stress.

4Define, a priori, the relationship between the $d_{ij}$'s
and the $\hat{d}_{ij}$'s; then stress becomes a function of the
$d_{ij}$'s alone (the X's or points in the mapping,
ultimately). The constraints represented by
$\hat{d}_{i_1 j_1} \le \hat{d}_{i_2 j_2} \le \cdots \le \hat{d}_{i_m j_m} \le \cdots \le \hat{d}_{i_M j_M}$
are really constraints on the $d_{ij}$'s, since the
relationship between the $\hat{d}_{ij}$'s and the $d_{ij}$'s has
been specified beforehand. The problem of minimizing stress,
then, has been transformed into a conventional nonlinear
programming problem. However, there exist $2^{M-1}$ possible
relationships between the $\hat{d}_{ij}$'s and the $d_{ij}$'s
(block equality systems) and, consequently, $2^{M-1}$ nonlinear
programming problems. Of course, some problems may not have
solutions since in some cases the constraints may imply a
feasible region that is the null set. Spang (1962) contains a
discussion of many techniques that could be employed to solve
these constrained minimizations. Since the publication of that
article, other techniques have been developed that might prove
helpful. (Klingman 1963, Glass and Cooper 1965, Box 1965)

5There appears to be a certain ordering to the $2^{M-1}$
different problems in the sense that stress is always lower
when the $\hat{d}_{ij}$'s are defined in one way than when they
are defined in another way. For example, stress is always lower
when each $\hat{d}_{ij}$ is defined to be equal to its
respective $d_{ij}$ than when the $\hat{d}_{ij}$'s are defined
in any other way. If this ordering could be determined, then
the first nonlinear programming problem that had a feasible
solution would be the one having the lowest minimum. The
ordering may only be a partial one, however, which would
complicate things considerably.


II.   STRESS  AS  A  MEASURE  OF  GOODNESS  OF  FIT 

A number of problems with Kruskal's goodness of fit
measure, stress, become evident upon closer examination. One
of these is the question of the meaning of stress, which, in
turn, is related to the problem of the specification of both
minimum and maximum possible values that the index can attain.
Another problem is the question of whether or not stress,
which will be shown to be dependent on ratios between the
$d_{ij}$'s, is an appropriate measure of the goodness of fit of
the rank order of the $d_{ij}$'s to the rank order dictated by
the $\delta_{ij}$'s. These problems of interpretation, maximum
and minimum possible values and appropriateness of stress will
be discussed in that order in this second part.

The meaning of stress is the first question to be raised
in this part. The square of stress would appear to lend itself
to interpretation as the percentage of the variance of the
$d_{ij}$'s not conforming to the monotonic (rank order)
requirements set by the $\delta_{ij}$'s. Under this
interpretation, stress itself (not stress squared) would then
be the square root of this number or the percentage of the
standard deviation of the $d_{ij}$'s not accounted for by the
monotonic requirements.

This  interpretation  of  stress  encounters  problems  as 
soon  as  one  examines  the  maximum  and  minimum  possible  values 
of  stress.   Intuitively,  the  minimum  should  be  0.0  and  the 
maximum  1.0.   Intuition  is  only  partially  correct  in  this  case. 

Obviously, if the $d_{ij}$'s perfectly satisfied the monotonic
requirements, then stress would be 0.0, since each
$\hat{d}_{ij}$ would equal its respective $d_{ij}$, as discussed
earlier. However, the conditions under which stress would be
equal to 1.0, which would be the logical maximum under the
"percentage of variance" interpretation, are not clearly
defined.

Stress would be 1.0 if all $\hat{d}_{ij}$'s were zero. However,
since the $d_{ij}$'s by definition cannot be less than zero and
all are not allowed to equal zero at the same time, a
degenerate solution which Kruskal disallows,6 all the
$\hat{d}_{ij}$'s, being averages of $d_{ij}$'s, cannot equal
zero. Similar problems arise if one tries to make each
$\hat{d}_{ij}$ equal to twice its respective $d_{ij}$, another
condition under which stress would equal 1.0.

6Kruskal, op. cit., p. 120.

The following is a short proof that in a particular
problem it is impossible for stress to equal 1.0. (Under the
"percentage of variance" interpretation, it should always be
possible for stress to equal 1.0.) Once again, suppose that
an investigator is using only three stimuli and the specified
rank ordering of the distances is as before:

$$\delta_{12} < \delta_{13} < \delta_{23}$$

As mentioned earlier, there are four possible block
equality systems for this problem:

(1)  $\hat{d}_{12} = d_{12}$
     $\hat{d}_{13} = d_{13}$
     $\hat{d}_{23} = d_{23}$

(2)  $\hat{d}_{12} = \hat{d}_{13} = (d_{12} + d_{13})/2$
     $\hat{d}_{23} = d_{23}$

(3)  $\hat{d}_{12} = d_{12}$
     $\hat{d}_{13} = \hat{d}_{23} = (d_{13} + d_{23})/2$

(4)  $\hat{d}_{12} = \hat{d}_{13} = \hat{d}_{23} = (d_{12} + d_{13} + d_{23})/3$

Stress will then equal

$$\sqrt{\frac{(d_{12} - \hat{d}_{12})^2 + (d_{13} - \hat{d}_{13})^2 + (d_{23} - \hat{d}_{23})^2}{d_{12}^2 + d_{13}^2 + d_{23}^2}}$$

If the first block equality system is used and a mapping that
allows $\hat{d}_{12} \le \hat{d}_{13} \le \hat{d}_{23}$ is
obtained, stress would equal 0.0. Consequently, if stress were
to equal 1.0, it would do so under one of the other three block
equality systems. Assume that stress can equal 1.0 under block
equality system number two. Then the following relationships
must hold:

$$\sqrt{\frac{\left(d_{12} - \frac{d_{12} + d_{13}}{2}\right)^2 + \left(d_{13} - \frac{d_{12} + d_{13}}{2}\right)^2 + (d_{23} - d_{23})^2}{d_{12}^2 + d_{13}^2 + d_{23}^2}} = 1.0$$

$$\tfrac{1}{4}\left((d_{12} - d_{13})^2 + (d_{13} - d_{12})^2\right) = d_{12}^2 + d_{13}^2 + d_{23}^2$$

$$2d_{12}^2 - 4d_{12}d_{13} + 2d_{13}^2 = 4d_{12}^2 + 4d_{13}^2 + 4d_{23}^2$$

$$-4d_{12}d_{13} = 2d_{12}^2 + 2d_{13}^2 + 4d_{23}^2$$

The last relationship is an obvious contradiction since
the right side of the equation must be greater than the left
side ($d_{ij} \ge 0$) unless all $d_{ij}$'s are equal to zero
(which is not allowed). The same type of contradiction arises
when block equality system number three is employed. Under
block equality system number four, the following "illegal"
statement is obtained:

$$-2d_{12}d_{13} - 2d_{12}d_{23} - 2d_{13}d_{23} = d_{12}^2 + d_{13}^2 + d_{23}^2$$
The  lack  of  a  clearly  defined  maximum  of  1.0  for  stress 
makes  a  "percentage  of  variance"  interpretation  difficult  at 
best.   However,  these  problems  are  not  nearly  as  serious  as 
those  associated  with  the  question  of  the  appropriateness  of 
stress  as  a  measure  of  goodness  of  fit.   This  question  is  very 
closely  related  to  the  discussion  in  the  first  part  about 
levels  of  measurement.   The  reader  may  want  to  refer  to  that 
section  at  this  time. 

Euclidean distances are numbers on a ratio scale. Both
the differences between two distances and the ratios between
two distances are meaningful. As mentioned earlier, the
proximity measures or psychological distances will normally
be measurements on an ordinal scale, although they may be
measurements on a linear interval scale if the law of
comparative judgment is invoked. Whether the psychological
distances are ordinal or interval measurements, the ratios
between two $\delta_{ij}$'s are not meaningful.

The major problem with stress as a measure of goodness
of fit of the $d_{ij}$'s to the $\delta_{ij}$'s is that it
depends on the ratio properties of the $d_{ij}$'s. The
following example will demonstrate that stress can be reduced
to a function of the ratios between $d_{ij}$'s.

The same three candidate example will be used. Assume,
once again, that the respondent has specified that
$\delta_{12} < \delta_{13} < \delta_{23}$. Further, suppose a
mapping which has the following distance relationships has
been obtained:

$d_{12} > d_{13}$
$d_{12} \le d_{23}$
$d_{13} < d_{23}$

In order to insure that
$\hat{d}_{12} \le \hat{d}_{13} \le \hat{d}_{23}$, block equality
system number two must be used, or:

$\hat{d}_{12} = \hat{d}_{13} = (d_{12} + d_{13})/2$
$\hat{d}_{23} = d_{23}$

Under these conditions,

$$\text{Stress} = \sqrt{\frac{\left(d_{12} - \frac{d_{12} + d_{13}}{2}\right)^2 + \left(d_{13} - \frac{d_{12} + d_{13}}{2}\right)^2}{d_{12}^2 + d_{13}^2 + d_{23}^2}}$$

or

$$\text{Stress}^2 = \frac{\left(\frac{d_{12} - d_{13}}{2}\right)^2 + \left(\frac{d_{13} - d_{12}}{2}\right)^2}{d_{12}^2 + d_{13}^2 + d_{23}^2}$$

Let $d_{13}/d_{12}$ equal K and let $d_{23}/d_{12}$ equal G. Then,

$$\text{Stress}^2 = \frac{K^2 - 2K + 1}{2 + 2K^2 + 2G^2}$$

or stress is entirely dependent on the ratios $d_{23}/d_{12}$ and
$d_{13}/d_{12}$.
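
The reduction to ratios can be checked numerically. The Python sketch below compares the ratio form with a direct computation from the distances under block equality system two; the function names and the value chosen for $d_{13}$ are assumptions made for the illustration.

```python
def stress_squared_from_ratios(k, g):
    """Squared stress under system two, with K = d13/d12 and G = d23/d12."""
    return (k * k - 2 * k + 1) / (2 + 2 * k * k + 2 * g * g)


def stress_squared_direct(d12, d13, d23):
    """The same quantity computed from the distances themselves."""
    mean = (d12 + d13) / 2  # common fitted value for d12 and d13
    numerator = (d12 - mean) ** 2 + (d13 - mean) ** 2
    return numerator / (d12 ** 2 + d13 ** 2 + d23 ** 2)


# Hypothetical distances satisfying d12 > d13, d12 <= d23, d13 < d23.
near = stress_squared_direct(2.0, 1.0, 3.0)
far = stress_squared_direct(2.0, 1.0, 1000.0)
same_ratios = stress_squared_direct(4.0, 2.0, 6.0)
```

In this sketch `near` and `same_ratios` coincide because only the ratios matter, while `far` is much smaller even though the rank order of the three distances is the same in every case.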

The problem is obvious. If the investigator is using
ordinal $\delta_{ij}$'s, stress is supposedly a measure of how
well the $d_{ij}$'s fit the rank ordering specified by the
respondent, that is the rank ordering of the $\delta_{ij}$'s.
This would seem to indicate that it should not be dependent on
a property which the original data do not possess, that is the
property that the ratios of the distances are meaningful.
Notice the effect of $G^2$ in the above equation. Stress
decreases as $G^2$ increases. (Recall that G equals
$d_{23}/d_{12}$.) The respondent might have specified that
$\delta_{23} = 6$ and $\delta_{12}$ = [illegible]. Why should
one obtain a lower stress value when $d_{23} = 1000$ and
$d_{12} = 2$ than when $d_{23} = 3$ and $d_{12} = 2$?

When the $\delta_{ij}$'s are interval measurements, that is
when the law of comparative judgment has been invoked, the same
problem arises. Why should the measure of goodness of fit be
dependent on the ratios between the $d_{ij}$'s when the ratios
between the $\delta_{ij}$'s are not meaningful?

An example of what can happen when stress is used as a
measure of goodness of fit may help. Suppose an investigator
wants to examine a 4 candidate situation and the respondent
specifies that
$\delta_{34} < \delta_{23} < \delta_{12} < \delta_{24} < \delta_{13} < \delta_{14}$.
First, suppose he has obtained a mapping with the following
$d_{ij}$'s:

$d_{34} = 1$     $d_{12} = 3$     $d_{13} = 10$

$d_{23} = 2$     $d_{24} = 4$     $d_{14} = 8$

Stress in this situation equals $\sqrt{1/97}$, or about 0.102.
Now suppose he obtains another mapping with the following set
of $d_{ij}$'s:

$d_{34} = 2$     $d_{12} = 3$     $d_{13} = 10$

$d_{23} = 1$     $d_{24} = 4$     $d_{14} = 9$

For this second mapping, stress is equal to $\sqrt{1/211}$, or
about 0.069.

Notice that the first mapping violated the rank ordering
specified by the $\delta_{ij}$'s only once while the second
mapping violated the rank ordering twice. Yet the first had a
higher stress value than the second.
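
The comparison can be recomputed directly. The Python sketch below fits each mapping by pooling adjacent violators (a standard monotone regression procedure, assumed here rather than taken from Kruskal's program), evaluates stress, and counts order violations. All function and variable names are hypothetical.

```python
import math
from itertools import combinations


def monotone_fit(d):
    """Non-decreasing least-squares fit by pooling adjacent violators."""
    blocks = []  # [sum, count] per block
    for value in d:
        blocks.append([value, 1])
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return [total / count for total, count in blocks for _ in range(count)]


def stress(d):
    """Stress of distances listed in the rank order of the judgments."""
    d_hat = monotone_fit(d)
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    return math.sqrt(numerator / sum(a * a for a in d))


def violations(d):
    """Number of pairs of distances that appear in reversed order."""
    return sum(1 for earlier, later in combinations(d, 2) if earlier > later)


# Distances in the order 34, 23, 12, 24, 13, 14.
first_mapping = [1, 2, 3, 4, 10, 8]
second_mapping = [2, 1, 3, 4, 10, 9]
for name, d in (("first", first_mapping), ("second", second_mapping)):
    print(name, violations(d), round(stress(d), 3))
```

Under these assumptions the first mapping has one violation and the larger stress, while the second has two violations and the smaller stress.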

In the next part, a new index of goodness of fit will be
proposed for the case of ordinal $\delta_{ij}$'s, and a very
simple method of dealing with interval $\delta_{ij}$'s will be
discussed.
III.   HOW  TO  FIT  EUCLIDEAN  DISTANCES  TO
PSYCHOLOGICAL  DISTANCES 

In this part a new index of goodness of fit, V, will be
introduced. This index is sensitive neither to the size of
the difference between two $d_{ij}$'s nor to the size of the
ratio between two $d_{ij}$'s. It also has a well-defined
maximum of 1.0 and a well-defined minimum of 0.0. This simple
index is based on the number of violations of the order
relations specified by the $\delta_{ij}$'s. A very simple
method of dealing with interval $\delta_{ij}$'s will also be
discussed in this section.

Earlier, the point was made that if one were attempting
to map N stimuli into Euclidean space, then there would be
N(N-1)/2 psychological distances, $\delta_{ij}$'s, associated
with these stimuli, and likewise N(N-1)/2 Euclidean distances,
$d_{ij}$'s, associated with the mapping. Again, let
M = N(N-1)/2. The respondent, by his answers, specifies a rank
ordering for the $\delta_{ij}$'s:

$$\delta_{i_1 j_1} < \delta_{i_2 j_2} < \cdots < \delta_{i_m j_m} < \cdots < \delta_{i_M j_M}$$

The problem of multidimensional scaling is to find a
mapping of the N stimuli. The Euclidean distances between
the points (stimuli) in the mapping should have, as nearly as
possible, the same rank order as that of the psychological
distances. Implicit in this rank order are M(M-1)/2 constraints:

$$
\begin{aligned}
d_{i_1 j_1} &\le d_{i_2 j_2} \\
d_{i_1 j_1} &\le d_{i_3 j_3} \\
&\;\;\vdots \\
d_{i_1 j_1} &\le d_{i_m j_m} \\
&\;\;\vdots \\
d_{i_1 j_1} &\le d_{i_M j_M} \\
&\;\;\vdots \\
d_{i_m j_m} &\le d_{i_{m+1} j_{m+1}} \\
&\;\;\vdots \\
d_{i_m j_m} &\le d_{i_M j_M} \\
&\;\;\vdots \\
d_{i_{M-1} j_{M-1}} &\le d_{i_M j_M}
\end{aligned}
$$

For any particular mapping, then, a possible index of
the goodness of fit of the mapping to the rank order constraints
would be the number of violations of the constraints by the
$d_{ij}$'s. In fact, this is the index that will be adopted,
except for one obvious alteration. The number of violations of
the constraints by the $d_{ij}$'s should be expressed as a
percentage of the maximum possible number of violations, or in
other words,

$$
V = \frac{A}{M(M-1)/2}
$$

where A is the actual number of violations and M(M-1)/2 is
the maximum possible number of violations. If V equals 0.0,
no violations occur and the mapping perfectly satisfies the
rank order constraints. If V equals 1.0, every constraint is
violated and the mapping perfectly violates the rank order
constraints.
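
The counting rule behind V can be illustrated with a short Python
sketch. It is not part of the original proposal. The function
names, the dictionary layout for the $\delta_{ij}$'s and the
treatment of ties (a tie among the $\delta_{ij}$'s imposes no
constraint, while a tie among the $d_{ij}$'s counts as a
violation) are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def pair_distances(points):
    """Euclidean distance d_ij for every pair i < j of mapped points."""
    return {(i, j): dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def violation_index(deltas, points):
    """Return V = A / (M(M-1)/2) for one candidate mapping.

    deltas maps each pair (i, j), i < j, to its psychological distance.
    points holds one coordinate tuple per stimulus.
    Assumes at least two pairs and no ties among the deltas.
    """
    d = pair_distances(points)
    ordered = sorted(deltas, key=deltas.get)   # pairs by ascending delta
    m = len(ordered)
    violations = 0
    for a, b in combinations(ordered, 2):
        # delta[a] < delta[b] requires d[a] < d[b]; anything else violates.
        if deltas[a] < deltas[b] and not d[a] < d[b]:
            violations += 1
    return violations / (m * (m - 1) / 2)


# Three stimuli on a line: d_01 = 1, d_12 = 2, d_02 = 3.
points = [(0.0,), (1.0,), (3.0,)]
same_order = {(0, 1): 1, (1, 2): 2, (0, 2): 3}
reverse_order = {(0, 1): 3, (1, 2): 2, (0, 2): 1}
print(violation_index(same_order, points))     # no constraint violated
print(violation_index(reverse_order, points))  # every constraint violated
```

With M pairs the double loop examines all M(M-1)/2 constraints, so
the cost of one evaluation of V grows with the fourth power of N.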

If this new index were adopted, the problem of
multidimensional scaling would become the problem of finding a
configuration that minimizes V. This minimization is very
similar to a problem encountered in mathematical programming,
the derivation of an initial feasible point. (Klingman 1963,
Hilleary 1966, Rosen 1961) A very popular technique for
finding a feasible point is Hooke and Jeeves' direct search
algorithm for unconstrained functions. (Hilleary 1966,
Klingman 1963, Hooke and Jeeves 1961)

The above references contain complete descriptions of
the direct search algorithm. In order to use the algorithm
to minimize V, one would start with an arbitrary configuration
of the N points in t dimensions. The t coordinate values for
each point would be the independent variables, making Nt
independent variables in all. A univariate search is first
performed, with each independent variable being changed by a
small amount, one at a time, in order to determine the
direction toward the minimum. If this exploratory move succeeds
in lowering the objective function, a "pattern move" is then
attempted. A pattern move is a move based on the directions
of the last two (sometimes more than two) exploratory moves.
Various modifications of the algorithm tend to differ with
respect to the weights given to previously successful
exploratory moves. If the pattern move does not succeed in
lowering the objective function, another exploratory move is
attempted. Eventually, the exploratory move will be unable to
lower the objective function. In that case, the step size of
the search is reduced and another search is performed. The
process is repeated until the step size reaches a predetermined
minimum.
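
The search just described can be sketched in Python as follows.
This is a minimal illustration of one common variant of the
direct search, not a transcription of any of the cited routines.
The parameter names (step, min_step, shrink) and the rule of
halving the step size are assumptions. The objective f would be
V regarded as a function of the Nt coordinates; a simple
quadratic stands in for it here so that the example is
self-contained.

```python
def explore(f, base, f_base, step):
    """Univariate exploratory search: try +step, then -step, on each variable."""
    x, fx = list(base), f_base
    for k in range(len(x)):
        for change in (step, -step):
            trial = list(x)
            trial[k] += change
            f_trial = f(trial)
            if f_trial < fx:          # keep the first change that lowers f
                x, fx = trial, f_trial
                break
    return x, fx


def hooke_jeeves(f, start, step=0.5, min_step=1e-3, shrink=0.5):
    """Direct search with pattern moves; returns (best point, best value)."""
    base = list(start)
    f_base = f(base)
    while step >= min_step:
        x, fx = explore(f, base, f_base, step)
        if fx < f_base:
            # Exploratory move succeeded: attempt pattern moves.
            while True:
                # Extrapolate from the previous base through the new point.
                pattern = [2 * xn - xb for xn, xb in zip(x, base)]
                base, f_base = x, fx
                x, fx = explore(f, pattern, f(pattern), step)
                if fx >= f_base:      # pattern move failed; keep the base
                    break
        else:
            step *= shrink            # exploration failed: reduce step size
    return base, f_base


def quadratic(x):
    """Stand-in objective with its minimum at (1, -2)."""
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2


best, value = hooke_jeeves(quadratic, [0.0, 0.0])
print(best, value)
```

Because the routine only compares objective values, it needs no
derivatives, which is what makes it a candidate for a step-valued
index such as V.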

The new index, V, bears a striking similarity to Kendall's
tau, a commonly used rank correlation coefficient. (Kendall
1962) In fact the two indices can be related by the following
equation:

$$
\tau = 1.0 - 2V
$$

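
The relation follows directly from the definition of tau as the
number of agreements in order minus the number of disagreements,
divided by the number of comparisons. The short derivation below
is a sketch that assumes no ties; the symbol T for the number of
comparisons is introduced here only for convenience.

```latex
% T = M(M-1)/2 comparisons in all; A of them are violations
% (disagreements) and the remaining T - A are agreements.
\[
\tau \;=\; \frac{(T - A) - A}{T}
     \;=\; 1 - \frac{2A}{T}
     \;=\; 1 - 2V,
\qquad T = \frac{M(M-1)}{2}, \quad V = \frac{A}{T}.
\]
```
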
Kendall's tau was not adopted as the index of goodness of fit
for two reasons. First, certain characteristics of V are
similar to characteristics possessed by Kruskal's index, stress.
For example, a perfect fit of the $d_{ij}$'s to the nonmetric
hypothesis would yield a value of 0.0 for both stress and V. For
both indices, a low value is interpreted as a good fit while a
high value is interpreted as a poor fit. Of course, with
Kendall's tau, a perfect fit would yield a value of 1.0. A
certain amount of consistency among indices of goodness of
fit seems desirable, and consequently V should be the preferred
index on this basis. Also, the "percentage of possible
violations" interpretation of V is intuitively appealing.

A second reason for adopting V instead of tau is based
on a disadvantage which both possess, but to a different
extent. Neither V nor tau is a continuous variable. V is not
continuous since the numerator, A, is discrete. In any problem,
the number of violations can be 0, 1, 2, 3 and so forth up to
M(M-1)/2. The difference between successive values of A is 1.
Kendall's tau has the same denominator as V; however, the
numerator is different. The difference between successive
values of the numerator is 2. In other words, V is on a more
compressed scale than tau. (The formula relating the two
indices also demonstrates this fact.) Kendall has shown that
as the denominator in the expression for tau (M(M-1)/2 in this
case) becomes large, tau approximates a continuous variable.[7]
In fact, he has shown that for a denominator greater than 45
(M greater than 10), tau can be considered to be a continuous
variable. The compressed scale of V approaches continuity
even faster than tau, since the difference between successive
values for the numerator is 1, not 2.

[7] Kendall, M. G., Rank Correlation Methods, 3rd ed.,
p. 69, Charles Griffin and Company, 1962.

A necessary condition for continuity of the function
relating values of the coordinates of the points in a Euclidean
mapping and V is that V itself be continuous. Consequently,
if the only impediment to the continuity of the function is
the lack of continuity of V (which is the case here), then as
V approaches (at the limit) a continuous variable, the function
will approach a continuous one. Most minimization techniques
require the assumption of continuity of the objective function.
Consequently, the index which allows the function to approach
a continuous one faster (V in this case) should be preferred.[8]

It  should  be  noted  that  stress  is  a  continuous  function. 
However,  as  mentioned  earlier,  the  minimization  of  stress  is  a 
minimization  under  constraints.   On  the  other  hand,  like  the 
problem  of  finding  a  feasible  point,  the  minimization  of  V  is 
essentially an unconstrained one.[9]

[8] In all likelihood, if the Hooke and Jeeves technique works
with V as the index, it will also work with tau as the index.
In fact, one could probably adopt the Spearman rho as the
goodness of fit measure if one desired to do so.

[9] One set of constraints is operative in this problem. The
$d_{ij}$'s are not allowed to equal zero. These constraints can
be handled effectively by the insertion of a penalty function
into the Hooke and Jeeves algorithm. This function would
automatically set the value of V equal to 1.0 when a
configuration with one or more zero distances is tested.

Like Kendall's tau, V is obviously not sensitive to the
magnitude of the difference between two $d_{ij}$'s nor to the
size of their ratio. As mentioned earlier, V has both a clearly
defined minimum of 0.0 and a clearly defined maximum of 1.0.
Both of these properties contrast markedly with the properties
of stress, which is dependent on the ratios between the
$d_{ij}$'s and does not have a clearly defined maximum of 1.0.
Also, Kendall has proposed a very simple way of dealing with
ties.[10] His method can be used in the computation of V, in the
event that certain $\delta_{ij}$'s or certain $d_{ij}$'s are
equal.

If the Hooke and Jeeves technique, or some other algorithm,
will in fact minimize V, the index would appear to have another
desirable property that Kruskal's stress does not clearly
possess. Suppose an investigator were to obtain a mapping of
14 political candidates that minimized V. For interpretation
purposes, he might want to examine the constraints that were
violated. It might happen that a large portion of the
violations (if not all of them) involved a particular stimulus,
candidate number 1, for example. (That is, $d_{12}$ is not less
than $d_{34}$ as it should be; neither are $d_{13}$, $d_{14}$
and so forth.) This kind of result might indicate that the
respondent based his judgments about candidates 2 through 14 on
two dimensions (for example, liberalism and good looks) while
randomly making judgments about candidate 1, or making them on
the basis of something other than the dimensions of liberalism
and good looks. This might be very important to an investigator
who is trying to interpret the dimensions.
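
The kind of examination suggested here can be sketched in Python.
The function below tallies, for each stimulus, the number of
violated constraints in which it takes part. The function name,
the dictionary layout and the small four-stimulus data set are
hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter
from itertools import combinations


def violations_by_stimulus(deltas, d):
    """Count, per stimulus, the violated constraints it takes part in.

    deltas and d map each pair (i, j) to its psychological distance
    and to its Euclidean distance in the mapping, respectively.
    Ties among the deltas are assumed to impose no constraint.
    """
    tally = Counter()
    ordered = sorted(deltas, key=deltas.get)   # pairs by ascending delta
    for a, b in combinations(ordered, 2):
        if deltas[a] < deltas[b] and not d[a] < d[b]:
            # Charge the violation to every stimulus in either pair.
            for stimulus in set(a) | set(b):
                tally[stimulus] += 1
    return tally


# Hypothetical data: only d_12 is out of order (too large).
deltas = {(1, 2): 1, (3, 4): 2, (2, 3): 3, (2, 4): 4, (1, 3): 5, (1, 4): 6}
d = {(1, 2): 3.5, (3, 4): 2.0, (2, 3): 3.0, (2, 4): 4.0, (1, 3): 5.0, (1, 4): 6.0}
print(violations_by_stimulus(deltas, d).most_common())
```

A stimulus whose count stands well above the others would be the
natural candidate for the kind of scrutiny described above.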


[10] Kendall, op. cit., pp. 34-48.
One  can  imagine  instances  when  there  might  be  a  number
of  different  mappings  that  will  yield  the  same  minimum  V 
value,  but  with  different  constraints  being  violated.   It  would 
appear  to  be  possible  to  eliminate  at  least  some,  if  not  all, 
of  these  mappings  from  consideration  by  choosing  the  one(s) 
with  the  smallest  number  of  stimuli  involved  in  violations.   In 
fact,  it  may  even  be  possible  to  insert  this  criterion  into 
the  minimization  problem. 

Now that a new index of goodness of fit for the case when the
$\delta_{ij}$'s are ordinal measures has been derived and
discussed, it should be clear what kind of index ought to be
used when the $\delta_{ij}$'s are interval measures. The easiest
index to use would probably be the Pearson r. The problem would
be one of seeking the maximum r between the $\delta_{ij}$'s and
the $d_{ij}$'s, or the minimum negative of r. Again, the
minimization would be an unconstrained one. A direct search
technique could again be used. The "optimum gradient" method or
one of the other gradient techniques discussed in Spang (1962)
might be more efficient than the Hooke and Jeeves technique in
this case, however. Since r is continuous, the continuity
problems inherent in the use of V or tau do not arise in this
case.
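
For the interval case, the objective to be minimized can be
sketched in Python as the negative of the Pearson r between the
$\delta_{ij}$'s and the $d_{ij}$'s of a candidate mapping. The
function names and data layout are assumptions made for
illustration, and the sketch assumes that neither set of
distances is constant (otherwise r is undefined).

```python
from math import dist, sqrt


def pearson_r(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def negative_r(deltas, points):
    """Objective for interval data: minus r between deltas and distances.

    deltas maps each pair (i, j) to its psychological distance.
    points holds one coordinate tuple per stimulus.
    """
    pairs = sorted(deltas)
    psychological = [deltas[p] for p in pairs]
    euclidean = [dist(points[i], points[j]) for i, j in pairs]
    return -pearson_r(psychological, euclidean)


# Deltas exactly proportional to the mapped distances give r = 1.
points = [(0.0,), (1.0,), (3.0,)]
deltas = {(0, 1): 2.0, (1, 2): 4.0, (0, 2): 6.0}
print(negative_r(deltas, points))
```

Minimizing this quantity over the Nt coordinates is the same as
maximizing r, and the objective varies smoothly with the
coordinates wherever the points are distinct.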
SUMMARY

The formulation of the new measure of goodness of fit,
V, and the discussion of the use of the Pearson r for interval
data complete the line of argument followed in this paper. In
the first part, multidimensional scaling was defined. The
ordinal or, under certain conditions, interval nature of the
data which are used as input into most multidimensional scaling
techniques was discussed. Finally, one approach to
multidimensional scaling, that of Joseph B. Kruskal, was
discussed in detail. The second part highlighted three problems
with Kruskal's measure of goodness of fit. These were the
problems of interpretation, of a well-defined maximum possible
value and of the appropriateness of the index. Most of the
discussion in the third and final part was concerned with a new
measure of goodness of fit when the data are measurements on an
ordinal scale. The relationship between V and Kendall's tau
showed that V is essentially a measure of the rank correlation
between the distances, $d_{ij}$'s, implicit in a particular
mapping of the stimuli, and the psychological distances, or
$\delta_{ij}$'s, which an investigator obtains from a
respondent. A method of minimizing V was suggested in this part
and the problem of continuity was discussed. Among the desirable
properties which V possesses are ease of interpretation, clearly
defined maximum and minimum possible values and insensitivity to
properties of the $d_{ij}$'s which the $\delta_{ij}$'s do not
possess. Finally, a straightforward extension of the use of the
rank correlation between the $d_{ij}$'s and $\delta_{ij}$'s was
proposed for the case when the $\delta_{ij}$'s are measurements
on an interval scale, namely the Pearson r or linear correlation
coefficient.

The next task to be performed, in a subsequent analysis, is, of
course, the programming of a technique for minimizing V. After
a routine is implemented, the output should be systematically
compared to output from Kruskal's routine. Once this task has
been completed, the distribution of V under various conditions
should be examined carefully. As David Klahr has pointed out
(Klahr 1969), this type of analysis alone will allow the
investigator to make probability statements about the goodness
of fit value which he has obtained.